The Origin Story of the Transformer Architecture
- 2025-09-24
The origin story of the transformer architecture. Key passages from Chapter 9 of Supremacy: AI, ChatGPT, and the Race That Will Change the World.
Illia Polosukhin, Ashish Vaswani, Jakob Uszkoreit. An RNN without the “recurrent” part:
The transformer’s invention in 2017 was about as impactful to the field of AI as the advent of smartphones was for consumers. …
The culture of complacency partly came from having so many talented scientists on staff, like Geoffrey Hinton. The bar was high, and Google was already using cutting-edge AI techniques, like RNNs, to process billions of words of text every day.
If you were a young AI researcher like Illia Polosukhin, you were sitting next to the people who’d invented these techniques. In early 2017, (he) was spitballing with two other researchers, Ashish Vaswani and Jakob Uszkoreit.
Researchers like Vaswani had been looking into the concept of “attention” in AI, which is when a computer can pick out the most important information in a dataset. Over their salads and sandwiches, the trio wondered if they could use that same technique to translate words more quickly and accurately. …
They were talking about removing the “recurrent” element of RNNs, which was crazy. …
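As a rough illustration (mine, not from the book) of the “attention” idea the trio was discussing, here is a minimal scaled dot-product attention sketch in Python. The names, shapes, and toy data are illustrative assumptions; this shows only the bare mechanism by which each position weights the other positions, not the full transformer.

```python
# Illustrative sketch of scaled dot-product attention (not from the book).
# Q, K, V are (sequence_length, d) matrices of query/key/value vectors.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # relevance of every position to every other position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over positions
    return weights @ V                                    # each position mixes in the values it "attends" to

# Toy usage: 3 tokens, 4-dimensional vectors (hypothetical data).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (3, 4)
```

The key contrast with an RNN is that nothing here is sequential: every position looks at every other position in one step, which is what made dropping the “recurrent” element plausible.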
Noam Shazeer joins:
Soon after he (Noam Shazeer) joined the ragtag group of researchers, Shazeer figured out some tricks that helped the new model work with large amounts of data. …
The name “transformer”:
Soon there were eight researchers working on the as-yet-unnamed project, writing code and refining the architecture of what they were calling the transformer. The name referred to a system that could transform any input into any output, and while the scientists focused on translating language, their system would eventually do far more. …
Llion Jones … was stunned to find that the system was doing something called coreference resolution. … “It’s a classic intelligence test [that] AIs failed on,” Jones says. But when they fed those same sentences into the transformer, the researchers could see something unusual happening to its “attention head.”
Writing of “Attention Is All You Need” begins:
About six months after those first conversations over lunch, the researchers wrote up their findings. Polosukhin had already left Google, but everyone else kept the project going and stayed in the office till midnight to wrap everything up. Vaswani, who was the lead author, slept on a nearby couch overnight.
The title:
Jones looked up from his desk, nearby. “I’m not very good with titles,” he replied. “But how about ‘Attention is all you need’?” It was a random thought that had popped into his head, and Vaswani didn’t say anything in agreement. In fact, he got up and walked away, Jones recalls.
But later, the title “Attention Is All You Need” landed on the front page of their paper, a perfect summary of what they’d discovered.
Sluggish Google:
(Transformer) had the potential to supercharge AI systems, but Google was slow off the mark to do anything about them. It took several years, for instance, for Google to plug transformers into services like Google Translate or BERT, an LLM that it developed to make its search engine better at processing the nuance of human language. …
Google’s cautious approach was largely a product of bloat. The downside to being one of the largest companies of all time, with a monopolistic grip on the search market, is that everything moves at a snail’s pace. You’re constantly afraid of public backlash or regulatory scrutiny. Your prime concern is maintaining growth and dominance.
All of the co-authors have left Google:
Frustrated, Shazeer left Google in 2021 to pursue his research on LLMs independently, cofounding a chatbot company called character.ai. … Of the eight researchers who invented the transformer, all have now left Google.
OpenAI:
Soon, Google was going to experience what Ashish Vaswani describes as a “biblical moment.” As Google continued printing money from its advertising business, OpenAI was taking what looked like a monumental step toward AGI, and it wasn’t keeping anything under wraps.